Bar Graphs for Multiple Genes from RNA Sequencing Data


In [33]:
import numpy as np
import matplotlib.pyplot as plt
import csv
aortaData = [] 
aortaDataNumbers = []
cerebellumData = []
cerebellumDataNumbers= []
arteryData = []
arteryDataNumbers = []
with open ("genomicdata.csv") as csvfile:
    readCSV = csv.reader(csvfile, delimiter='\t') #parses the file as tab-delimited

    for col in readCSV: #note: each `col` is one row of the file
        if col[7] == '': #aorta; blank cells are treated as 0
            aortaData.append('0')
        else:
            aortaData.append(col[7])
        if col[13] == '': #cerebellum
            cerebellumData.append('0')
        else:
            cerebellumData.append(col[13])
        if col[15] == '': #coronary artery
            arteryData.append('0')
        else:
            arteryData.append(col[15])
    aortaDataNumbers = list(map(float, aortaData[19:24]))
    cerebellumDataNumbers = list(map(float, cerebellumData[19:24]))
    arteryDataNumbers = list(map(float, arteryData[19:24]))
    
ind = np.arange(len(aortaDataNumbers))  # the x locations for the groups
width = 0.25       # the width of the bars; three bars per group must total less than 1 to avoid overlap

fig, ax = plt.subplots()
rects1 = ax.bar(ind, aortaDataNumbers, width, color='r') #creates rectangles for aorta
rects2 = ax.bar(ind+width, cerebellumDataNumbers, width, color='g') #creates rectangles for cerebellum
rects3 = ax.bar(ind+width*2, arteryDataNumbers, width, color='b') #creates rectangles for coronary artery

ax.set_ylabel('Expression Vector') #Y axis label
ax.set_xlabel('Gene Expressed') #X axis label
ax.set_xticks(ind + width) #centers each tick under the middle bar of its group


ax.legend((rects1[0], rects2[0], rects3[0]),
          ('Aorta', 'Cerebellum', 'Coronary artery')) #legend so readers know which color is which tissue


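def autolabel(rects): #labels each bar with its height
    for rect in rects:
        height = rect.get_height() #height of each bar
        ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%.1f' % height, #one decimal place, so fractional values are not truncated
                ha='center', va='bottom') #places the value just above the bar

for rects in (rects1, rects2, rects3): #the function must be called for the labels to appear
    autolabel(rects)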


plt.show()


In this code, I combined what I worked on last week (creating a gene expression bar chart) with this week's task (creating a gene expression bar chart for multiple genes). The three tissues I chose were the cerebellum, the coronary artery, and the aorta. I originally planned a bar chart with only two tissues: the cerebellum and the coronary artery. I selected an arbitrary set of genes (array positions 19-23, i.e. the slice 19:24) to see how the tissues differ. After graphing these genes for the cerebellum and the coronary artery, I saw that the expression vectors were roughly the same for every gene except the last one (ENSG00000002549 LAP3). For ENSG00000002549 LAP3, however, the coronary artery had noticeably higher expression than the cerebellum. Because of this, I added the expression vectors for the same set of genes in the aorta, since the aorta and the coronary artery are both part of the cardiovascular system. As I predicted, the aorta also has a much higher expression vector for this gene.
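
The comparison above is made by eye from the chart. As a minimal sketch of how "higher expression" could be put into numbers, the snippet below computes a simple fold change between two tissues. The function names, the pseudocount, the 2-fold threshold, and the sample values are all assumptions made for illustration; they are not taken from genomicdata.csv, and a ratio like this is not a statistical test.

```python
def fold_change(a, b, pseudocount=1.0):
    """Ratio of two expression values; the pseudocount avoids division by zero."""
    return (a + pseudocount) / (b + pseudocount)


def compare_tissues(values_a, values_b, threshold=2.0):
    """Return the indices where tissue A exceeds tissue B by at least `threshold`-fold."""
    return [i for i, (a, b) in enumerate(zip(values_a, values_b))
            if fold_change(a, b) >= threshold]


# Hypothetical expression values for five genes (NOT read from genomicdata.csv).
artery = [10.0, 4.0, 0.0, 7.5, 60.0]
cerebellum = [9.0, 5.0, 0.0, 8.0, 11.0]

higher_in_artery = compare_tissues(artery, cerebellum)
print(higher_in_artery)
```

With the real data, the two lists would be arteryDataNumbers and cerebellumDataNumbers from the cell above.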


In [37]:
import numpy as np
import matplotlib.pyplot as plt
import csv
aortaData = [] 
aortaDataNumbers = []
cerebellumData = []
cerebellumDataNumbers= []
arteryData = []
arteryDataNumbers = []
with open ("genomicdata.csv") as csvfile:
    readCSV = csv.reader(csvfile, delimiter='\t') #parses the file as tab-delimited

    for col in readCSV: #note: each `col` is one row of the file
        if col[7] == '': #aorta; blank cells are treated as 0
            aortaData.append('0')
        else:
            aortaData.append(col[7])
        if col[13] == '': #cerebellum
            cerebellumData.append('0')
        else:
            cerebellumData.append(col[13])
        if col[15] == '': #coronary artery
            arteryData.append('0')
        else:
            arteryData.append(col[15])
    aortaDataNumbers = list(map(float, aortaData[73:80]))
    cerebellumDataNumbers = list(map(float, cerebellumData[73:80]))
    arteryDataNumbers = list(map(float, arteryData[73:80]))
    
ind = np.arange(len(aortaDataNumbers))  # the x locations for the groups
width = 0.25       # the width of the bars; three bars per group must total less than 1 to avoid overlap

fig, ax = plt.subplots()
rects1 = ax.bar(ind, aortaDataNumbers, width, color='r') #creates rectangles for aorta
rects2 = ax.bar(ind+width, cerebellumDataNumbers, width, color='g') #creates rectangles for cerebellum
rects3 = ax.bar(ind+width*2, arteryDataNumbers, width, color='b') #creates rectangles for coronary artery

ax.set_ylabel('Expression Vector') #Y axis label
ax.set_xlabel('Gene Expressed') #X axis label
ax.set_xticks(ind + width) #centers each tick under the middle bar of its group



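def autolabel(rects): #labels each bar with its height
    for rect in rects:
        height = rect.get_height() #height of each bar
        ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%.1f' % height, #one decimal place, so fractional values are not truncated
                ha='center', va='bottom') #places the value just above the bar

for rects in (rects1, rects2, rects3): #the function must be called for the labels to appear
    autolabel(rects)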


plt.show()


For this graph and the ones that follow, I chose a gene with especially high expression in the cerebellum. I expected such a gene to show little or no expression in the two cardiovascular tissues. This gene, at position 75 in the array, proved that prediction wrong: the cerebellum, the coronary artery, and the aorta all showed high expression for it.
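
Only the slice bounds change between these cells, so the loading step could be wrapped in a function instead of being copied each time. The sketch below is one way to do that under stated assumptions: read_columns and its parameters are hypothetical names, and the small in-memory sample stands in for genomicdata.csv so the example runs on its own. As in the cells above, blank cells become 0 and only the selected rows are converted to floats.

```python
import csv
import io


def read_columns(csvfile, columns, start, stop):
    """Read rows [start, stop) of a tab-delimited file and return one list of
    floats per requested column index, treating blank cells as 0.0."""
    selected = {col: [] for col in columns}
    reader = csv.reader(csvfile, delimiter='\t')
    for row_number, row in enumerate(reader):
        if row_number < start:
            continue  # rows before the slice (e.g. headers) are skipped unparsed
        if row_number >= stop:
            break
        for col in columns:
            cell = row[col]
            selected[col].append(float(cell) if cell != '' else 0.0)
    return selected


# Hypothetical three-column sample (NOT the layout of genomicdata.csv).
sample = io.StringIO(
    "gene\taorta\tcerebellum\n"
    "G1\t1.5\t\n"
    "G2\t\t3.0\n"
    "G3\t2.0\t4.5\n"
)
data = read_columns(sample, columns=(1, 2), start=1, stop=3)
print(data)
```

With the real file opened as csvfile, a call such as read_columns(csvfile, (7, 13, 15), 73, 80) would correspond to the columns and slice used in the cell above.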


In [47]:
import numpy as np
import matplotlib.pyplot as plt
import csv
aortaData = [] 
aortaDataNumbers = []
cerebellumData = []
cerebellumDataNumbers= []
arteryData = []
arteryDataNumbers = []
with open ("genomicdata.csv") as csvfile:
    readCSV = csv.reader(csvfile, delimiter='\t') #parses the file as tab-delimited

    for col in readCSV: #note: each `col` is one row of the file
        if col[7] == '': #aorta; blank cells are treated as 0
            aortaData.append('0')
        else:
            aortaData.append(col[7])
        if col[13] == '': #cerebellum
            cerebellumData.append('0')
        else:
            cerebellumData.append(col[13])
        if col[15] == '': #coronary artery
            arteryData.append('0')
        else:
            arteryData.append(col[15])
    aortaDataNumbers = list(map(float, aortaData[722:727]))
    cerebellumDataNumbers = list(map(float, cerebellumData[722:727]))
    arteryDataNumbers = list(map(float, arteryData[722:727]))
    

ind = np.arange(len(aortaDataNumbers))  # the x locations for the groups
width = 0.25       # the width of the bars; three bars per group must total less than 1 to avoid overlap

fig, ax = plt.subplots()
rects1 = ax.bar(ind, aortaDataNumbers, width, color='r') #creates rectangles for aorta
rects2 = ax.bar(ind+width, cerebellumDataNumbers, width, color='g') #creates rectangles for cerebellum
rects3 = ax.bar(ind+width*2, arteryDataNumbers, width, color='b') #creates rectangles for coronary artery

ax.set_ylabel('Expression Vector') #Y axis label
ax.set_xlabel('Gene Expressed') #X axis label
ax.set_xticks(ind + width) #centers each tick under the middle bar of its group


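def autolabel(rects): #labels each bar with its height
    for rect in rects:
        height = rect.get_height() #height of each bar
        ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%.1f' % height, #one decimal place, so fractional values are not truncated
                ha='center', va='bottom') #places the value just above the bar

for rects in (rects1, rects2, rects3): #the function must be called for the labels to appear
    autolabel(rects)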


plt.show()


This graph shows two different things.

The first is what I was expecting to see in the previous graph. For the gene at position 722 in the array, the expression vector for the cerebellum is high, whereas the gene shows no expression in either the coronary artery or the aorta.

The second is something I thought to look for after seeing the results in the previous graph. For the gene at position 725 in the array, the expression vector for the aorta is much higher than that for the coronary artery.
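
Genes like the one at position 722 (expressed in one tissue and absent from the others) could also be searched for programmatically rather than picked out by hand. The following is a minimal sketch of such a search; tissue_specific, the min_expression cutoff, and the sample values are assumptions made for illustration and are not taken from genomicdata.csv.

```python
def tissue_specific(target, others, min_expression=1.0):
    """Return the indices where `target` is at least `min_expression` and
    every list in `others` is zero at the same position."""
    hits = []
    for i, value in enumerate(target):
        if value >= min_expression and all(other[i] == 0.0 for other in others):
            hits.append(i)
    return hits


# Hypothetical expression values for four genes (NOT read from genomicdata.csv).
cerebellum = [12.0, 0.0, 3.0, 8.0]
artery = [0.0, 5.0, 0.0, 2.0]
aorta = [0.0, 6.0, 1.0, 9.0]

cerebellum_only = tissue_specific(cerebellum, [artery, aorta])
print(cerebellum_only)
```

Because the loading code stores blank cells as 0, a zero here may mean either "not expressed" or "no value recorded", so any hits would still need to be checked against the source file.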

